Foreword

Two symposia were conducted during the 1971 Annual Meeting of the
American Educational Research Association to deal with a recent book
entitled Educational Evaluation and Decision Making, prepared by the
Phi Delta Kappa National Study Committee on Evaluation. The first
symposium was descriptive in nature, while the second was evaluative.
This report contains only the information from the second symposium,
since the substance of the first symposium is already available through
the Phi Delta Kappa book.

The first symposium was a description of a two-year effort by the
PDK National Study Committee on Evaluation to analyze problems and to
conceptualize relevant solutions in the field of evaluation. Members
of the PDK Committee introduced and summarized the material contained
in the eleven chapters of their final report. This symposium was
intended to provide the basis for the second related symposium, in
which experts in educational change theory, educational administration,
educational psychology, philosophy of science, and educational evaluation
offered critical reactions to the PDK report on evaluation.

The second symposium was chaired by Walter J. Foley, a member of
the PDK Study Committee. The critiquers included Henry M. Brickell,
Institute for Educational Development; John C. Flanagan, American
Institutes for Research; William B. Michael, University of Southern
California; Michael Scriven, University of California at Berkeley;
and James L. Wardrop, University of Illinois at Urbana-Champaign.
Each critiquer had reviewed an advance copy of the PDK book and had
developed a formal reaction. Copies of the critiques, as edited by
the authors, form the substance of this report.

As organizer of the second symposium, I wish to express appreciation
for the diligent efforts of the reviewers to provide in-depth
reactions to the PDK report and for Walter Foley's capable chairing
of the session. Important issues are identified among the critiques
and should serve to further the efforts of those who are committed
to improving both the theory and practice of educational evaluation.

DLS



A CRITIQUE OF THE REPORT OF THE PHI DELTA KAPPA 
STUDY COMMITTEE ON EVALUATION

Henry M. Brickell

The enormous scope of the Commission report is an achievement
for its authors but a massive problem for its critics. Where does
one grasp the beast to wrestle with it? For those of you in the
audience who have not read the Commission report, the risk you face
is that each one of us at the symposium will drag a different section
into the ring, leaving the bulk of the creature outside the arena.

That will give you the impression not that we are reporting on
an elephant but that we have segmented a giant platypus and randomly
assigned its parts for evaluation. I wanted you to have my assessment
of the context we are in before hearing my input into the process
of judging the product. (The Commission's report can change your
language.)

The breadth of the Commission's work outreaches any critique
of less than 532 pages, the length of the book itself. Its publication
is an action that triggers reactions. It accuses; one wants to make
counter-accusations. It argues; one wants to argue back. It illustrates;
one wants to use opposing illustrations. Where it asks a question, one
wants to answer. When it gives an answer, one wants to question.

To set some limits around my own comments, I will view this new
creature with the eyes of a practicing decision-maker in a public
school district. There are two reasons for this. The first is that
I have spent most of my career as a decision-maker and have had a good
chance to observe other decision-makers in action. The second reason
is that there is a large and growing demand for evaluation services
for those who manage either ongoing school programs or special
projects and who must periodically decide whether to continue, modify,
or terminate them. The obligation to evaluate ESEA Title I and other
federal programs is of course a major source of that demand. But
certain new developments, such as rising community interest in schools
and the growth of "performance contracting," increase the demand for
evaluation services. Decision-makers, who are the clients for the
kind of evaluation being proposed, will have definite opinions about
the utility of those services. Since funds for the kind of evaluation
being proposed by the Commission will come from those clients rather
than from traditional research-funding sources, their reactions will
shape the future of the movement. Research can be pursued at the
initiative of the individual scholar, even without special funding,
but the kind of evaluation envisioned by the Commission certainly
cannot be. Thus, apart from the need for more detailed conception and
a much better methodology (needs pointed to by the Commission itself),
the reactions of decision-makers may be decisive.

Now I will react as a decision-maker. All right. Here is a body
of theory and practice that wants to be my servant -- no, my consultant
or even more accurately, my colleague. My very first reaction is,
"I have met someone like you before -- in fact, quite a number of you."

I have a school psychologist who often comes in when I am trying to
make decisions and explains that I ought to use him as a consultant.
He tells me that the school is a wholly human enterprise, that he
specializes in human behavior, and that inasmuch as my decisions
deal with people he ought to guide me. The last time he was in I had
to break off for an appointment with the curriculum coordinator, who
explains that since instruction is the central function of the
school, I ought to let him advise on all my major decisions. The
day before, my business manager had reminded me that we run the place
on money, that every decision I make has a price tag, and that he can
keep us in the black only if I will bring him in on my decisions in
advance. I didn't know whether to be more impressed with that or with
what our community relations man had said about how he could keep us
out of needless trouble with the activists if I would check with him
beforehand on how my decisions would be received in the community.
I had left the community relations specialist to meet with the
building principals, who complain that I spend too much time listening
to the central office staff. The real work of the system takes place
out in the school buildings, they say, and they as principals have
the best vantage point for helping me make decisions. The teachers'
union has negotiated itself a chair on my side of the desk so that
it can keep me acquainted in advance with how my decisions will be
taken by teachers, which the union explains is only simple justice
since the teachers are the people who must carry out decisions.

Once I thought I had command of everyone's territory. Now I
realize that they all have command of mine.
But I must admit that you are especially intriguing to me, you
new-breed evaluator. The immediate reason happens to be that I have
just hired a planner. He has explained his job to me. I have learned
from him that planning is the central decision-making function and
that he will thus spend most of his time helping me make decisions.
I am not surprised. But what interests me most is that he wants to
help me 1) set goals, 2) choose among alternative courses of action he
has generated, 3) allocate people and money to the chosen course of
action, and 4) use the results to plan better next time. Obviously,
not only are both of you in my territory, which is already crowded,
you are also in each other's. So not only can I welcome you to the
club, I even know who to give you for a roommate. Maybe the two of
you can work out which one is going to help me do what.

Before I learned about your concept of evaluation, I had a
fairly simple picture of how to use both a planner and an evaluator.
I would send the planner out of one door of my office with a roll of
plans under his arm and eventually you, as an evaluator, would come
in the other door with an evaluation report on how the plans had
worked out. You could meet the planner at my desk. But I realize
now that he has stretched himself so far forward that he is standing
at your door and you have stretched yourself so far backward that you
are standing at his. I have had to deal with role conflict and now I
can see I am going to have to deal with role overlap.

I have some other reactions as well. You remind me for some
reason of the action research movement. As Max Corey described it,
action research in the hands of the classroom teacher didn't sound to
me much like research, but it did sound like intelligent action. For
example, if the teacher could tell that things were going wrong she
was supposed to change them right then -- not keep on going so she
could accumulate a solid mass of dismal results at the .05 or,
preferably, .01 level. That made sense; that's exactly the way I do
things as a decision-maker. You do seem to understand that, unlike
most of the evaluators I have met, I am not going to hold things
steady just so you can evaluate them. But I have always felt the
evaluators believed I was somewhat sloppy, that they couldn't help
me if I wouldn't play the game their way, and that it was all my
fault. If you could really help me without getting in the way and
cramping my style while I am trying to run with the ball, then I'm
interested. But I have enough other hurdles to jump over without
having to clear some extra ones that you set up.

You remind me also for some reason of the new curriculum packages
where the examination doesn't come at the end of the course but is
scattered in pieces throughout, lesson by lesson or unit by unit, so
that the teacher can tell how things are going -- even child by child --
without waiting until it's too late. That makes sense. If you
can do something like that about the decisions I have to make in
my office, I'm interested.

One thing I may as well tell you quite frankly. I wouldn't even
be considering you for a job if I didn't have these federal programs
that I have to evaluate. Your salary is going to come right out
of that evaluation money and you are going to have to keep the state
officials satisfied that our Title I projects are successful so
that we can keep on getting the funds. I am willing to change
projects that don't succeed, you understand, but what you have to
know is that the state people are mainly interested in tested
pupil performance -- Washington pushes them that way, of course -- and
they don't want to settle for anything else. If your approach
is going to add something to the evaluation of that ultimate
product -- pupil learning -- but not try to substitute something for
it, then I'm interested. But we have to keep the state people
happy and I have to be sure they will settle for your method.

Something I don't fully understand is how you are going to use
the findings of the research done elsewhere, particularly the research
that is generalizable to my situation. Are you saying that when I
put in something that has been proven successful by previous research
elsewhere, I still have to pay to have it evaluated all over
again in my district? Why can't we use those other results?

The things you are talking about doing sound pretty expensive.
But if I understand you correctly, we don't have to evaluate
everything. At least, we don't have to put everything through
the full evaluation cycle. We could evaluate only the major changes.
Or we could evaluate all the changes. Or, if I could afford it,
we could evaluate everything whether it has been changed or not.
That decision would be up to me and it would depend on what we
could afford. Right?

One thing I am very concerned about is that you not go around
turning up a lot of trouble. I think things are going pretty well.
Anyhow I hope so -- for several reasons. One is that the staff is
working pretty hard under difficult conditions and I am not interested
in having you bring in a critical report every couple of weeks
finding fault with something. The teachers and principals need
encouragement and a sense of success more than anything else.
In addition to that, it would not make a very good impression on
the Board if you found trouble in every corner. One big reason
I hope you are not going to find many things wrong is that I
can't keep on changing everything all the time. First, the
district can't afford it and second, I don't want the place in
constant turmoil. So, if your approach can locate success as well
as failure -- and not show that nothing we try ever works, like most
of our past evaluation consultants have found -- then I am interested.
At the very least, you will have to help me rank any problems you
find so that I can solve the worst of them and let the others go
by.

I have been paying a little attention to the performance
contracting movement. We are not really interested right now
and I want to see how it goes in other districts first, but I
would like to know whether your approach could put the finger on
things that we would be better off contracting out to some other
agency -- or maybe even to our own teachers' union. The Board of
Education would certainly be interested in that.

One other thing you have to understand. I may not always
take your advice, even if you think it's good. I have other things
to think about. Let me give you an example. Last October an
evaluation consultant I had hired brought in a report on our
paraprofessionals for the previous year. He was able to show
pretty convincingly that paraprofessionals weren't having any effect
on the test scores of the elementary children, which we have been
hoping they would. He said his findings forced him to recommend
that the paraprofessional program be terminated in favor of something
else. Great. Great advice. That's all he knew about it. What
he didn't know was that we were in a bit of a recession in this city
last fall. All I needed to do was to drop those minority-group
paraprofessionals from the payroll -- all of them live right around
the schools where they work, have kids in school, and a lot of
contacts in the neighborhood. Fire them all during that recession
and we would have had something close to an armed revolution. So
naturally, I still have the program going on just as before. I
can't use advice like what he gave. Now, if there is something in
the way you do evaluation that can take into account all aspects
of the problem, then I'm interested.
So much for my reactions as a practicing decision-maker. Let
me step out of his shoes and comment on the Commission's work as
an evaluator, which I am from time to time. As an evaluator, I
would welcome something that would enable me to stop saying to the
decision-makers that come to me:

"Too late. You should have come to me long before you started
this program. It's January, and you've been underway since September.
Your achievement testing program doesn't contain anything I can
use as a post-test, much less a pre-test. You didn't even specify
pupil behavioral objectives when you started this International
Exchange Program for Teachers. Moreover, it looks like everybody
who could benefit from it is already in the program so we don't
even have a control group. Sorry; I can't help you."

Well, I want to help. I don't want to be a sorry researcher if
I can be a useful evaluator.

In summary, the work of the Commission represents a promising
way of bringing disciplined inquiry into the service of the
decision-maker, something researchers have had great difficulty in
doing. The amount of intelligence and hard work applied by the
Commission surely will advance us toward that objective.

Certainly we need something between the mindless and rapid
evaluations performed in the early days of Title I and the excessive
reactions to them which have government officials today expecting
us to show that if a Teacher Corps Trainee goes to a good lecture
as a college junior in 1971, her pupils in 1973 will get higher
achievement test scores as a result.

The Commission report itself is honest in trying to point out
its own shortcomings. One of the best critiques is the
self-examination contained in the final chapter of the volume.

What I admire most is the Commission's daring to
call for work which requires the invention of new methodology --
rather than inventing new work to fit the available methodology.




A CRITIQUE OF THE MEASUREMENT AND INSTRUMENTATION ASPECTS 
OF EDUCATIONAL EVALUATION AND DECISION MAKING

John C. Flanagan 

As one of a panel of five reactors to this report, I feel a little
bit like one of the five blind men describing the elephant. Unlike
the blind men, my colleagues and I all see this elephant, but our
descriptions are likely to be rather dissimilar because of our
special fields of interest and our varied previous experiences. Thus,
the descriptions, although all based on the same 532-page, 3-ton
elephant, can be expected to be quite different.



In my description the emphasis will be on techniques of measurement,
data collection, and the central role of the individual student in
evaluation activities. Before proceeding to specific points, some
general impressions seem in order. First, the report is comprehensive,
detailed, and analytical. It analyzes evaluation into stages occurring
in various settings, having various scopes, and providing information
relevant to various types of decisions. It is thorough and systematic
and provides a very good framework. Of course, it isn't the book I
would have written because it doesn't really do much for providing
measurement techniques, at least not new ones, or even a real good
review of what we do have, and of course it doesn't center on the
individual as the evaluation unit in the way that I would like to
see it, although there is mention of this.
The report is based on a specific definition of evaluation,
which is: "Educational evaluation is the process of delineating,
obtaining, and providing useful information for judging decision
alternatives."

This definition is followed by a discussion of four stages in
the process of decision-making including seventeen specific elements.
In addition to this study of the decision process, there is a detailed
description of possible decision settings, decision models, types
of decisions, and some problems related to decision-making. In this
chapter and the chapters which follow on criteria, values, information
and systems theory, and evaluation methodology, the emphasis seems
to be on delineating and discussing all possibilities rather than on
the practical side of the conduct of educational evaluation.

In Chapter Seven, the four types of evaluation are presented
together with a general model for conducting any one of these types
of evaluation. The three steps which are proposed for all types
of evaluation are delineating, obtaining, and providing.

The four types of evaluation are:

First, context evaluation, which has as its purpose "to provide
a rationale for determination of objectives. Specifically, it defines
the relevant environment, describes the desired and actual conditions
pertaining to that environment, identifies unmet needs and unused
opportunities, and diagnoses the problems that prevent needs from
being met and opportunities from being used. The diagnosis of
problems provides an essential basis for developing objectives whose
achievement will result in program improvement."

The authors state that context evaluation is the most basic
kind of evaluation. The authors divide context evaluation into two
modes: contingency and congruence. In the contingency mode, evaluation
searches for opportunities to improve the system by changing the
objectives. The congruence mode evaluates the extent to which
intended objectives are achieved.

This reviewer strongly endorses the emphasis on this type of
evaluation and the distinction between the two modes for studying
the objectives of the system. The discussion, however, seems to lack
sufficient emphasis on needs and opportunities with respect to individual
students and, although there is an emphasis on broad exploratory probing,
it appears desirable that there be more specific provision for
unplanned outcomes and the achievement of unintended objectives as
well as those intended for the system.

The second type of evaluation, input evaluation, is intended
to provide the basis for selecting a design to achieve program objectives.
This involves the study of relevant capabilities, strategies for
achieving objectives, and basic specific designs for implementing
a proposed strategy.

The authors point out that "techniques for input evaluation
are lacking in education." One available technique which appears
applicable and is not discussed is the method of explicit
rationales.

The third type of evaluation, process evaluation, is intended
"to provide feedback to persons responsible for implementing plans
and procedures." To a substantial degree, what these authors have
included in process evaluation has come to be known as formative
evaluation, following the terminology of Michael Scriven. The
objectives of process evaluation are: to monitor the implementation of
the design, to provide information needed for planned decisions
during the implementation phase, and to maintain a record of the
extent to which the project is actually implemented as designed.
This type of evaluation is clearly of great importance.

The fourth type of evaluation, product evaluation, measures and
interprets the extent to which objectives were achieved. The
criteria which are measured to perform this evaluation are classified
as either instrumental or consequential, following Scriven's terminology.
Instrumental criteria refer to what have been frequently called
intermediate criteria. Consequential criteria are those usually
called ultimate criteria.

The authors point out that "in the assessment of objectives 
relating to adoption, product evaluation and context evaluation 
ultimately merge in the measurement of the impact of the total 
change effort on the overall system. Context evaluation then takes 
on the systematic functions of monitoring the total system and the 
ad-hoc product evaluation is terminated." 

In a later section, the authors state "product evaluation assesses
attainments of change projects within a system, and context evaluation
assesses the impact of the obtained change on the total system."
This distinction between context and product evaluation seems to be
a useful one. In their general discussion of the features of their
evaluation model, the authors again emphasize the basic importance
of context evaluation and the need for a much more comprehensive data
base to perform this function. Unfortunately, they do not seem to
go far enough in developing specifications and procedures for
collecting this very important data base. Educational systems have
continued to operate with very little attention at either the local
or national level to the study of the needs of individual students.

The authors of this study have inserted two or three paragraphs
suggesting the use of the individual student as the unit of measure
in evaluation studies. The remarks are relevant and valuable. It
would be desirable if their implications were carried through more
fully in the subsequent discussions of implementing evaluation
programs.

The later chapters of the report on implementing and administering
evaluation programs need to be supplemented by handbook materials
on what data to collect to study the needs and opportunities of
the total educational system, especially as it relates to the individual
student. Some of the procedures used in recent years to obtain such
data include intensive case studies of students on a sampling
basis; follow-up studies of recent graduates to determine the utility
of the knowledge and abilities achieved in school; and intensive
studies of adults in various roles and activities to determine
the specific educational objectives which would have been most
appropriate for them during their study programs in school. These types
of data are not mentioned in the book.

There is probably no more important problem in education at
the present time than determining the educational objectives for
each individual student. It is believed that during the later
educational years, much of the responsibility for these decisions
should be given to the student. To prepare him for taking such
responsibility it is believed that one should start in the primary
grades by giving students some responsibility in planning and carrying
out their educational programs. This will necessarily be limited in
the early years, but the ability to take responsibility requires
much practice.

This will require that the student know the specific knowledges
and abilities required for many adult roles and activities. He must
also know something about the nature of learning and individual
differences and be able to estimate the extent of effort required
for him to achieve a specific level of proficiency with respect
to various types of content or ability. To assist the student
in formulating his long-range educational and occupational goals, the
behavioral scientist needs detailed and extensive studies of students
both during and following their exposure to specific educational
experiences which can be made available to current students as a
basis for making their decisions. A minor point regarding the present
report is that in the view of these authors such behavioral scientists
are clearly functioning as evaluators in providing the basis for
individual decisions; however, their functions appear to be broadly
those of the behavioral scientist and not specifically those usually
considered as appropriate for an evaluator.

To sum up this review of educational evaluation and decision-making
as presented by these authors, the first point to be noted is
that the definition selected by these authors includes only one type
of evaluation in education and therefore should not be thought of as
the only function of evaluation methods in the educational field.
There are many instances in which evaluative data are very desirable
even though no decisions have been defined and no actions are
anticipated. However, for purposes of decision-oriented educational
evaluation, the report has much to commend it. The efforts of the
seven members of the Phi Delta Kappa Commission on Evaluation
represent an important step forward in increasing our understanding
and ability to conduct effective educational evaluation studies. As
the authors point out, this is only the beginning of an important
effort to improve our educational programs.
A CRITIQUE OF THE METHODOLOGY OF EVALUATION IN THE PDK
VOLUME, EDUCATIONAL EVALUATION AND DECISION MAKING

William B. Michael

The PDK volume affords probably the most comprehensive and
penetrating conceptualization of educational evaluation and
decision making currently available. The CIPP
(context-input-process-product) Model can use much of existing research
methodology, particularly in product evaluation, but does require new
methodology, especially in reference to the context, input, and process
components that tend to be somewhat more closely associated with
formative evaluation than with product evaluation.

The modest stance which the PDK Evaluation Study Committee
members have taken and their receptivity to suggestions undoubtedly
mean that there will be ample opportunity for the work group to make
improvements and to move forward in developing the kinds of methodology
that will be needed. They seem to be neither inflexible in their
orientation to evaluation nor resistant to suggestions. For this
kind of openmindedness they are to be commended.

One may look at the evaluation methodology relative to the
CIPP Model from two standpoints: (1) the feasibility of the CIPP
Model given current research methodology, and (2) the need for new
methodology given the CIPP approach to evaluation. After consideration
of each of these two broad topics, some recommendations will be set
forth which essentially will summarize many of the comments that
are made in the critique of the first two categories.

Feasibility of the CIPP Model Given Current Methodology 

Relative then to the first category, the feasibility of the
CIPP Model given current research methodology, these comments may
be offered:

1. In summative evaluation of products, as in accountability
studies, classical research methodology involving use of an experimental
and control group still affords a viable approach provided that evaluators
can be placed in positions of influence and power in funding and
governing agencies to allow them to require certain approved research
designs -- especially those designs incorporating use of randomization
in large-scale evaluation studies. Such large-scale studies are
expensive, but the educational enterprise is also expensive. Such
carefully planned investigations carried out under relatively well
controlled conditions would permit the formation of causal inferences
and generalization as well as valid methods for determining
cost-effectiveness in accountability studies.

2. Relative to existing research methodology, several other
points may be noted.

a. A great deal of existing research methodology can be
used in evaluation studies with quite specific and limited objectives
as in the determination of the most effective ways to teach
multiplication or subtraction or to acquire clearly delineated
psychomotor skills.

b. Laboratory schools within colleges of education afford
opportunities for experimentation in which products can be evaluated
in a relatively reliable and valid manner.

c. Some research designs such as the multiple time series
or regression discontinuity analysis described by Campbell and Stanley
afford a basis for drawing causal inferences especially in relation
to product-oriented evaluation and for making decisions regarding
effectiveness of alternative educational programs.

d. In the laboratory orientation of many colleges of
education opportunities exist to investigate systematically in an
evaluation setting such important problems as differential rates
of retention and learning, the transfer of learning which, too often,
is glibly assumed to occur, and the effectiveness of group versus
individual problem solving endeavor through use of simulation games.

Need for New Methodology Given the CIPP Model

Despite this rather positive state of affairs which allows one
to use existing research methodology for product evaluation in a
reasonably well controlled setting, there certainly is, however, a
need for new methodology given the CIPP approach to evaluation.
Especially in the context, input, and process forms of evaluation
which are often carried out in different social and environmental
contexts, there is a definite need for new methodology, much of which
might be adapted from anthropology, sociology, history, political
science, and economics. Curriculum specialists and evaluators find
the distinctions among these three components of context, input,
and process evaluation to be somewhat overlapping and at times to
be somewhat ambiguous or even contradictory.

The following points may be noted: 


1. Throughout the recycling process of the CIPP Model, which
involves constant feedback and dynamic modifications in the various
steps of evaluation and decision-making, there is the need to
establish either explicitly or implicitly cause and effect
relationships primarily through narrowing or limiting the number
of possible alternative hypotheses. The work of Yee and Gage
recently reported in the Psychological Bulletin offers considerable
promise for establishing possible direction of cause and effect
among several sets of correlated measures obtained at different
times.

2. A pressing need exists to develop a methodology for
establishing value systems in the selection of objectives and in their
implementation particularly within the realm of context and input
evaluation. After all, the word "evaluation" in essence contains
the word "value." The problem of setting priorities in the selection
of objectives within a social context is particularly essential
and deserving of serious systematic treatment. Again, methodological
approaches underlying historical, anthropological, and sociological
research could be potentially very helpful.

3. In all phases of the CIPP Model and throughout the whole
process of decision making, consideration needs to be given to the
ways in which different kinds of available information may be sorted,
integrated, and incorporated in the decision-making process. Such
information can be examined in a relatively well controlled simulation
game or through observation in a real day-to-day school setting
permitting replicability. Field tests need to be made of many of
the procedures suggested in the CIPP Model, as Bob Hammond of
Montana's Department of Public Instruction is doing just now. He
is involving people with training in other kinds of disciplines.
For example, he has one man with a background in banking who, in
having a rather realistic orientation to cost accounting and
cost-effectiveness, is trying out the CIPP Model in rather practical
contexts. Furthermore, field testing will allow for extensive
trial and error observation so that one will know how the various
components of the CIPP Model will work.

4. Another underlying methodological concern is the need to
distinguish between the role of the evaluator and the decision-maker
and to ascertain in a given context for evaluation the relative
degree of independence or overlap of their respective functions.



5. Once refinements in the methodology for use with the
CIPP Model are developed, concerted efforts will be required to
familiarize and train individuals in the use of the model. For
example, participants in a two-day simulation game held at The
Ohio State University in conjunction with an evaluation of the PDK
volume last June said that they found the model was very useful
as a conceptual framework but difficult to implement because of their
lack of familiarity with it. They necessarily relied on past experience
and intuition to make evaluation decisions. Thus, extensive study of
and experience with the CIPP Model are needed so that one can use it
rather automatically without having to ferret through all of the
various components. Currently the complexities of the model will
probably prevent its wide use and application until there is extensive
in-service training.
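
As an illustration of the kind of analysis alluded to in point 1 above,
the following is a minimal sketch of a simple cross-lagged correlation
comparison between two measures, A and B, each taken at two times. It is
not the specific technique of Yee and Gage, whose details are not given
here; the function names and the sample scores are hypothetical and are
introduced only for illustration. If the correlation of A at time 1 with
B at time 2 is clearly larger than that of B at time 1 with A at time 2,
the data are more consistent with A influencing B than the reverse,
although such a comparison narrows alternative hypotheses rather than
proving causation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / sqrt(sxx * syy)


def cross_lagged(a_time1, b_time1, a_time2, b_time2):
    """Return the two cross-lagged correlations (r_A1B2, r_B1A2).

    r_A1B2: measure A at time 1 against measure B at time 2.
    r_B1A2: measure B at time 1 against measure A at time 2.
    """
    return pearson(a_time1, b_time2), pearson(b_time1, a_time2)


if __name__ == "__main__":
    # Hypothetical scores for five classrooms (illustration only).
    a1 = [1, 2, 3, 4, 5]   # e.g. a teacher measure, time 1
    b1 = [2, 1, 4, 3, 5]   # e.g. a pupil measure, time 1
    a2 = [3, 5, 1, 4, 2]   # teacher measure, time 2
    b2 = [2, 4, 6, 8, 10]  # pupil measure, time 2
    r_a1_b2, r_b1_a2 = cross_lagged(a1, b1, a2, b2)
    print("r(A1, B2) =", round(r_a1_b2, 3))
    print("r(B1, A2) =", round(r_b1_a2, 3))
```

In practice the two cross-lagged values would be read alongside the
same-time correlations and the stability of each measure before any
direction of effect is suggested.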

It is important to point out that the heuristic properties of
the CIPP Model for doctoral dissertation research are great indeed,
even if its use displaces the overworked classical experimental-
control group models. Doctoral committees should encourage students
to do developmental studies even though they may represent a break
with tradition. As professors of educational research, many in the
audience could encourage students to do developmental kinds of work
such as that which could be used for validating the CIPP Model.



Recommendations



Within the context of what has been said the following
recommendations may be formulated:

1. The CIPP Model should be given extensive consideration as
a guide for conceptualizing essential characteristics of discrepancy
evaluation activities.

2. Efforts should be made whenever possible to use existing
research methodology in implementing many of the objectives of the
CIPP Model, especially in product evaluation.

3. Attention should be directed toward devising new methodologies,
many of which can be adapted from those of the social sciences, to
answer questions raised by application of the CIPP Model. In
particular, the following recommendations may be set forth:

a. Concerted effort should be made throughout the stages
of context, input, process, and product evaluation to furnish
different kinds of evidence that will make possible formulation of
both explicit and implicit inferences regarding possible cause and
effect relationships.

b. Systematic efforts should be directed toward the development
of a methodology for setting value systems in the selection
and implementation of objectives as in context and input evaluation.

c. The full process of decision making in reference to the
availability of many kinds of information should be studied
systematically in relation to the CIPP evaluation model.

d. Attention should be given to how the CIPP Model
can be advantageously used in accountability studies, for which
there will be increasing pressures and demands.

4. Further energy needs to be directed to refine and
differentiate the educational and technical role of the evaluator, a
distinction which has not been made entirely clear to the satisfaction
of a number of persons who attended the PDK-The Ohio State University
conference last June.

5. The feasibility of establishing seminars and in-service
training institutes to give the CIPP Model greater visibility and
utility to the school community should be investigated.

Summary Evaluation 

All in all, the CIPP Model offers great promise of providing
both external and internal validity of the evaluation process.
Certainly the initial three steps of context, input, and process
evaluation do much to sharpen the thinking of the evaluator who is
oriented toward product evaluation, because the first three steps
indeed afford the monitoring, recycling, and feedback functions
upon which effective product evaluation depends. The external
validity, however, is still open to serious concern, especially in
the accountability studies. Threats to external validity may be due
most often to a lack of randomization or to the inability of the
evaluator to assume a position of power and influence which he
might assume in evaluation studies involving decisions about a
multi-million dollar educational enterprise. Irrespective of the size
of the evaluation effort or the magnitude of the decision-making
processes involved, the CIPP Model probably affords the most
comprehensive conceptualization of education currently available. The
expenditure of efforts to develop new and to adopt existing
methodologies for obtaining, analyzing, and interpreting the data which
the model can generate should increase its usefulness in the education
establishment.

EVALUATION: NOBLE PROFESSION AND PEDESTRIAN PRACTICE

Michael Scriven

The PDK report is certainly the most compendious and may well
be the most valuable treatise on evaluation in the literature.
Perhaps there should be a three-minute silence at this place, for
that's almost the last nice thing I say. But that's just because
it's inefficient to keep on saying nice things; the brain does not
process redundant information.

I think there are significant flaws in its basic conception of
evaluation as well as in its practical advice. Here are a few.

Its basic conception excludes crucial paradigms of evaluation:
for example, the evaluation by historians of Napoleon's tactics at
Waterloo. I take a non-educational case, for interest, but there can
of course be educational ones too; suppose you are evaluating the
school system in Athens. It seems clear that this is logically
the same kind of enterprise as evaluating the tactics of an
on-going field general or contemporary educational activities. But
the latter can easily and the former can usually not be subsumed
under the PDK (ex-CIPP) definition which requires that evaluation
be "data gathering for future decision making." Now you might, if
you have a copy of the great work in front of you, think that was a
little unfair of me, because the basic definition of evaluation
which they use -- "Evaluation is the process of delineating, obtaining,
and providing useful information for judging decision alternatives" --
does not contain the word "future" at all. But we very quickly find
from the way it is interpreted that "future" is the key point and
the implicit definition in the PDK report is data gathering for future
decision making. As a matter of fact, two pages before the definition
there is a page which contains on it nothing but the following
enlightening slogan, in capital letters: THE PURPOSE OF EVALUATION IS
NOT TO PROVE BUT TO IMPROVE. Now that's great for formative evaluation,
but that is not, of course, the same thing as evaluation in general.

I don't think one wants to restrict the conception of educational
evaluation to formative evaluation (they actually restrict it further
than that). So it seems to me a mistake to try and tie it in to data
input for future decision making. Evaluation suffered for a long
time from being regarded as simply summative, but we don't have to
swing so far over as to say it's never summative. We find many
cases later on where it's even more obvious that they are thinking
only of evaluation as data input for future decision making. I
think the mistake here is like the mistake of defining government,
for political science texts, say, in such a way as to cover only good
government. What you should do, I believe, is to define government
as neutrally as you can, and then get into the question of what
distinguishes good government from bad government under the heading
of political philosophy; in this case the philosophy of
evaluation-as-it-should-be-done by contrast with the definition of
evaluation. This attempt to use a persuasive definition is something
which greatly affects the whole treatment and I think is unfortunate.

Another thing that it excludes, for example, are evaluations done
on the basis of instant gestalt-trained judgment. Now in many areas,
for example in the grading of livestock and veterinary equipment,
grading and evaluating in a straightforward way goes on and is
essentially a perceptual activity. Probably the evaluation of
students depends heavily on this instant gestalt person-perception
activity. I don't want to exclude that by definition, but it isn't
evaluative on their account. It's a judgment of worth or merit,
and how it is done, whether that's valid or not, is a quite separate
question.

And so my first worry then is that to use this particular account
excludes some important types of evaluation, which you may wish to
condemn as irrelevant or improper but I don't think you would want
to regard them as not being evaluation at all.

The second problem with their conception is that it includes
vast areas of the non-evaluative cognitive domain, e.g., administration
theory and large parts of data gathering in the educational area,
a decision which seems to me to unhealthily dilute the notion of
evaluation. The notion of evaluation essentially involves the judgment
of worth or merit. Now to do an evaluation you've got to go and
gather a lot of data first. It's reasonable enough to say that that's
part of the job of an evaluator, but it's confusing to suggest that
doing it is a kind of evaluating. Since a theoretician also has
to do just that, you might as well call it theorising. I think
it's wrong to suggest that most of what they talk about as context
evaluation is really evaluation. It is not; it's a market survey.

After you've done the market survey, which you indeed must do before
you can do a good evaluation, then you get started in the business
of tying the needs you uncover in with performance criteria of other
alternatives and making judgments of worth and merit and producing
the evaluation, whether it's formative or summative. But I don't
think it's helpful to talk about "context evaluation." It's true
that one part of a market survey may sometimes involve an evaluation
of competitive products, in the strict sense; but most of it -- often
all of it -- is simply a survey of wants. And that part of a market
survey is anyway not part of what PDK call context evaluation. They
call that "input evaluation." But input evaluation as a whole might
better be called a resources and options survey in which some evaluation
of the relative merits and efficiency of those options and resources
goes on. But a lot of it is simply survey.

Then process evaluation is hard to work out -- these
are very, very tricky terms that are defined in six pages of closely
written material; they're not defined briefly at all -- process
evaluation isn't quite, as I thought it was when I first looked at
it, essentially formative evaluation. It is mostly monitoring and
bookkeeping, two of the three elements they identify as process
evaluation in this report -- slightly changed from the standard
CIPP account of this, I think. It may even include social bookkeeping.
Now these may be tremendously important for a specific evaluation, they
may be necessary for a school system's operation, but they are not
themselves formative evaluation at all (contrary to Bloom's unhelpful
bowdlerization). They may be feeding into one. But process evaluation,
in the technical sense they use it, is by and large not formative
evaluation at all (see page 315).

Product evaluation, surprisingly enough, turns out to be both
summative and formative evaluation. I don't want to impose these
terms of mine. I just mean by them the kinds of evaluation that
are used (a) to improve a developing product, etc., and (b) to
determine the merits of a completed, unchangeable one. The actual
process of evaluation -- the nature of evaluation in one sense --
is usually the same in both cases, but the role it plays and, in a
sense, the kind of entity evaluated is different. The feedback
loop from formative evaluation is within the project information flow
chart -- it terminates at a decision maker who controls the next
R & D cycle, or someone who controls him. In summative, the feedback
is to a consumer, typically, or a spectator (a historian, for example)
or -- perhaps -- to a judge of the producer who may be considering
hiring him for a related job. PDK's mistake is to take the
decision-making role of many of these users of summative evaluation as
definitional. But if the term "decision-maker" has any content at all,
it does not include the fan in the stands evaluating a single play, or
the academic version of him in the history department. (If you do call
these "decision-makers," e.g., because they "decide" on what judgment
to make, or because there are some actions of theirs that are affected
by large numbers of these little evaluations, then you have totally
diluted the PDK definition, since everyone is now always a
decision-maker and the process of obtaining information for these
decisions includes every kind of observation and reflection -- in short,
all cognitive processes are evaluation. The trivial or profound
sense in which this is true eliminates the possibility of any theory
of evaluation in the sense which PDK undertakes, since they do not
include the whole of cognitive and creative psychology.)

In these terms, what is "product evaluation" as the term is
used in the PDK report? They frequently say things like this: "This
is what has traditionally often been thought of as all there is to
evaluation." So you think to yourself, summative evaluation, in my
terminology; but it turns out in the fine print that that's not
true at all, since they insist that product evaluation itself must
be part of the decision-making process; and the only kind of evaluation
of which that's true is formative. So, it seems to me, these
categories are rather misleadingly referred to as four types of
evaluation. I don't think that's the best way to put it, not that
there's a sharp line between "gathering data as a basis for making
an evaluation" and "evaluation," but just because the CIPP approach
fails to demarcate the data claims and the value claims themselves.

To make a much tougher and more pedagogical claim about the
CIPP analysis, it seems to me about the most complicated and
confusing way of analysing the practical procedures of evaluation that
I can imagine, and it's certainly the most complicated one that
I've ever seen. Not only is it impossible that teachers will grasp
this, or that school personnel are going to use this without very
intensive in-service training, but I think it's very doubtful whether
what they'll do after substantial training will repay the cost of
the training. I don't think we have to say that this is such a
complicated subject as relativity theory, where that situation would
not be surprising. I suspect -- to put it pragmatically -- that a
ten to one condensation of CIPP would gain so much in teachability
that any distortions introduced would be overshadowed by improved
comprehension. I think PDK have a duty to try this, or at least
to bet that it can't be done, in which case I'll try it. I think
it's terribly important to do this. The less jargon we can get by
with the better; let's junk "formative" and "summative" and all
these other terms, "instrumental" and "consequential" and so on,
along with funny terms like context evaluation, and let's see if
we can produce equally good evaluators in less time without them,
or better evaluators in the same time.

I would like to go into details about the definitions of these
terms, but time is short, so let me instead try to make some practical
points. The first practical point is that these arguments about
definitions are not "mere semantic issues" at all. Many programs are
now getting adequate funds for evaluation, but those who run the
programs and those who run the evaluations often have completely
different ideas as to what they're supposed to do with those funds.
And I don't think the PDK report is going to do enough to reduce
this confusion because it includes too much and also excludes some
aspects of responsible evaluation as I see it. There have already
been cases where the granting agency terminated the evaluation
contract because there was "not enough data-gathering" going on,
although no case was made that the required further data was necessary
for evaluation of the program in question. PDK is quite clear
about this kind of point at times. At one place towards the end
they say in effect, Don't fool around gathering detailed cost data
if cost is not an object (in an experimental program, for example).
Don't waste all that resource and time and effort and thought. All
right, but if you take that seriously, then you've got to be much
more precise about the distinction between general social bookkeeping
or monitoring, and getting precisely and only those pieces of
information that you've got to have for the evaluation. In particular,
it should be clear to everybody remotely concerned with evaluation
that one does not have to know a single damn thing about what is
going on in an educational process in order to know that the
project or method or process has completely failed, completely
succeeded, or come somewhere in between. This is not always true,
and that is why I say that the crucial point to understand is that
the evaluation may require absolutely no knowledge of what went on
between Day One and Last Day. Naturally, if there are process criteria
in the criteria of achievement, you'll have to look at process.
If, for example, you think it's important that the classroom be run
democratically, you'll have to look in the classroom. But if
you are using retention criteria you don't need process data
(except to identify the occurrence of the experimental variable
in the experimental group and its absence in the control group(s)).

But USOE isn't too clear about this; at times they feel that
if you're not collecting lots of data, you're not doing a
decent evaluation. Even for formative evaluation
this isn't true, where you often do want to know which features
of an educational package were responsible for its success. To the
extent you do this, you are going beyond evaluation in the
straightforward sense. Evaluation of a package is simply doing what it
takes to decide if the package did any good. Deciding what effects it
had without using the language of merit is both a narrower and a broader
enterprise than evaluation; and deciding what it was in the package
of merit is, of course, an analytical enterprise of considerable
complexity, properly called educational research but not -- except
by association -- evaluation. Even deciding what the package was,
that is, settling on an appropriate description for it, is process
research. It won't do to argue that these things are all sharply
separate -- they're not. It won't even do to argue that they
should all be made as separate as possible. For example, one of the
most useful kinds of evaluation is the radical comparison; to take
an imaginary example, one might assert that the talking page device
does a good job of teaching certain vocabularies, but no better
than the same page without the expensive talking feature at 1/9
the cost. That's useful evaluation. It's useful in the PDK sense;
it feeds into a future decision. It is much more useful than just
saying that the talking page does a great job. The talking page is
a damned expensive item and the question that's important is whether
that's where you ought to put your money. Now "radical comparisons"
require some analytical comparative fractionating research. So
I'm not arguing for a sharp distinction, but I am arguing for
distinguishing whenever in fact the distinction doesn't cost you,
because it will save you a great deal of money to make the distinction
unless you actually have to run the evaluation into something
else. For example, Sam Ball ran up the bill for evaluating Sesame
Street quite a bit by getting into some research questions about which
parameters controlled what variance: interesting, useful, probably
justifiable -- but not in the guise of evaluation.

I am pretty nervous about the rather casual way in which the
PDK team dismiss what they refer to as the classical experimental
design model for evaluation. They do not give detailed reasons for
this. They say things like "what you need to evaluate a pupil may
not be what you need to evaluate the utility of a program." Now
that may well be true. But you need to get down to cases and say
when it's true (if it is true) and what general conclusions can be
drawn from that. I myself feel that the strength of the classical
comparative type of evaluation we've always known is still very much
greater than the suggestions by Cronbach and PDK would have us
believe.

Turning now to a more serious point: this conception of evaluation
has grave professional consequences for us, among them the
elimination of fundamental criticisms of the client's objectives, since
these are accepted as the axioms for the study if so presented
by a client. That is, unless he comes and asks you to help with
these, to a large extent they are accepted. (Page references on
this important criticism -- and there are times when they jump the
other way -- include the following: 489 (a good clear one), 183,
327, 410, 387, 411, 414, 419, and 422.) In my view, even when the
client does not want his criteria criticized, the responsible
evaluator is completely obligated -- contrary to almost everything
the PDK report suggests except in a few places at the end -- to
subject them to the most minute scrutiny both before and after accepting
the job. For these criteria may and frequently do contain unsuspected
and well concealed anti-social, anti-personnel, impractical, or
inconsistent assumptions, not to mention false and unclear ones. This
fatal error of orientation shows up throughout the report, and it
has deep philosophical foundations, as we see on page 41: "Selection
of criteria always implies some value system, and values are essentially
arbitrary even if not unreasoned." Now the cat's out of the bag.
The fundamental approach of PDK is that value judgments are essentially
arbitrary; and therefore you're not entitled to criticize those of
the client. That seems to me a terrible position to adopt. It
seems to condone an abrogation of the responsibility of the evaluator
and I think it's precisely this philosophy that led to a lot of crap
masquerading as evaluation that surrounded Title I. I don't think
that it's arbitrary to conclude that Title I money was systematically
misused throughout the South. It isn't arbitrary to reject a
client's goals if they involve the misuse of funds appropriated by
Congress for helping the needy. It is obligatory. The doctrine of
equal rights is not, in my view, arbitrary but the most rational
social strategy. Quite certainly, most evaluation today is simply
naive if it supposes the client incapable of conscious or unconscious
fraud on the government or the students or the parents or the
teachers, and move one for an evaluator is looking into this possibility.
(Move two is having someone else look into the possibility that he is
himself putting one over.)

So it is not incidental but crucial that the evaluator is a
conscience as well as a consultant, an auditor as well as an adviser.
Whether in the formative or the summative role, he is worth little
if he loses his independence. In summative evaluation it must be
he who signs the report and edits it and in certain circumstances
gets it published if the client refuses to publish it. (There is
an echo of sensitivity to that point on page 301.) Otherwise he
connives at preventable fraud. This is not a mere technical adviser's
role, it is an autonomous professional role with a code of professional
ethics attached, related in many ways to the auditor's role.

An amusing antecedent to this is to be found in medieval
Japan, where great families depended for their livelihood and
honor upon their leaders' skill and reputation as a sword-evaluator.
Their mark on a blade was a jealously guarded, hard-earned, and
indispensable adjunct to the sword-maker's own signature. It was
completely independent: whole families, generation after generation,
depended absolutely upon the integrity of that signature. As a
matter of interest, these early evaluators were strong supporters
of the behavioral objectives approach. Criterion performance was
checked out by the use of prisoners from the nearest jail, and the
testers' signs on the blade indicated the "severage quotient"
rather precisely. (Records do not show whether allowance was made
for inter-subject variance.)

Well, this notion of independence is not clear enough in PDK,
it seems to me. The very elaborate attempt, from page 470 on, to
evaluate their own report, using their theory of evaluation, strikes
one as a bit odd. Of course, we all do our own formative evaluation
of everything we write, albeit not very well, but we can't possibly
claim to do our own summative evaluation. The swordsmiths had the
point right. They didn't sign the blade twice, the second time as
an evaluator. They broke it if it wasn't a good blade. If they put
it out over their signature, that meant that they thought it was a
good blade. That's what one expects from reports, it's implicit in
them. Not much can be gained from a second section which says
"OK, now have we done well? Yes, we have done well." I think
one just has to face the fact that there is a role of evaluation which
it's just not sensible to suppose that you can do by yourself.



The practical issues connected with this point are numerous.
For example, the institutional contamination of evaluators who are
on the staff of school districts, projects or state departments
may make the PDK model for the use of evaluators in the educational
system not quite viable. I do not think they examine with sufficient
care the possibility that the role for such personnel is very tricky,
and that much or some of their work is better handled by the use of
outside consultants.

On a related point, it may be fatal to fragment the role of
evaluation as CIPP and PDK do in their team approach. It seems to me
they finish up without either an evaluator or an evaluation team. They
break the task out into the statistician, the administrator, the public
relations man, and so on. But you look at this list and wonder how
this group is going to get together and produce an evaluation.
It's not clear what they've got in mind, and I certainly don't
think you want to add a moralist to the pack. But I do think they
underestimate the role of axiological or value-analysis training for
at least some of the team.

In conclusion, two minor points. A dimension of evaluation that
I find of great importance in practice is the form of presentation of
the evaluation itself. Quite commonly, compression, for example,
is a merit that is hard to achieve and most important for dissemination.
(This point is surely one of the reasons the letter-grade dies so hard.)
Now I want to make a correction of the handout version some of you
have, which was written for ERIC some time ago, from my notes without
PDK's report to look at; it is simply incorrect as it stands. The
PDK report does discuss mundane matters such as presentation, and indeed
I am very glad they do. Something it does not do is to consider
treating alternative forms of presentation as educational products
themselves, and hence appropriate for evaluation in an experimental
way, not by you as the author sitting there and saying you think
they're good (having just written them), but by running a set of tests
on them. So, for example, in coming up with the CIPP outline itself,
why not treat that six pages or ten pages as being an educational
product which should be evaluated? Perhaps in the workshop this
was done, but it's hard for me to believe. To do it right, you get
a naive graduate student from mathematics to read it, make what the
hell he can out of it, summarize it in ten lines, and then try that
summary as your control and see whether the teachers do better or
worse from it. That would be a beginning on the "radical comparison"
evaluation.

Finally, perhaps the best of all the practices of a good
evaluator is getting the benefit of criticism from the other side.
It goes very hard on the ego at times, but it's the mark of the
professional to do it. I want to conclude by congratulating the
PDK team for their willingness to invite the expression of deviant
viewpoints, not only in the use of an 11-man review panel in the
formative stage but also in the use of this panel in the summative.
I'm not sure they expected my viewpoint to be this deviant, but
arranging the panel was in the best tradition of a quite noble
profession, and I salute them for it.

A last footnote: I think the list of specialties from which
evaluators can learn, which they give on page 455, a useful kind of
suggestion, could be supplemented somewhat. They say that evaluation
specialists can receive "aid in comprehending their work from three
areas: general systems theory, economics, and political science.
The first is important to the evaluator as he considers the structure
of phenomena on which decisions focus. The second, economics, sheds
light on the nature of the decisions. The third, political science,
contains constructs helpful in understanding the process of choosing."
I would add these: (i) radical political theory, in order to see
the other options in what they call input evaluation; (ii) ethics
or value theory, never discussed by them and very important. They
mention them once in one line very near the end, but they deserve
better; (iii) historiography: it never occurs to them that the
analysis of historical material for evaluative purposes may have
something important for us to learn from, and of course it undermines
their definition of evaluation; (iv) accounting: a relatively
unsophisticated approach to cost accounting procedures, cost-effectiveness
analysis, is really important in some evaluations; (v) language
analysis: in all their lists of specialties they never see
that the skill of congruence identification between the description
of goals and the items in the pool of tests is a skill in and of
itself that is absolutely transcendent over statistical skills and
test construction skills of the ordinary kind, and that its absence
is the death of more test construction endeavors than anything else.

Well, one of the last things they say is "Nothing is worse
than destructive criticism." I don't agree at all. Silence is
much worse. I have at least avoided that.



DETERMINING "MOST PROBABLE" CAUSES: A CALL FOR
RE-EXAMINING EVALUATION METHODOLOGY

James L. Wardrop

It is illustrative of the scope of the PDK monograph that I
feel, after listening to these four expert reactions that have gone
before, that I still have something to say. I have a feeling that
if the members of the PDK Committee made a serious attempt to
reconcile their presentation with our reactions, they would work
for at least twenty years before coming out with a new one. To
set the stage for my remarks (which Bill Michael has already
alluded to), I should point out that one thing I have seen happening
is people doing studies of a sort which would have to be considered
as poor research and attempting to justify them under the heading
of "evaluation." I would also like to say before I get into my
substantive comments that although what I say might seem to imply
that I am accepting the PDK definition of evaluation and the CIPP
approach, this is not strictly true. I have used their definition
and the CIPP approach because it provides me with a foil for my
arguments. In this respect, then, I may not be quite fair to the
members of the Committee.

My major point can be stated quite succinctly; namely, a
central focus of educational evaluation is explanation (or, more
precisely, the selection among several possible alternative
explanations).¹

Once I have stated this thesis, the really hard work begins.
I have now obligated myself to do three things: first, to explicate
this succinct statement and attempt to give it substance; second, to
justify the assertion I have made; and finally, to indicate in some
way how my thesis may be viewed as a reaction to the PDK materials.
If I succeed in discharging any one of these obligations, my day will
have been an unprecedented success.


A Notion of Causality 

I circulated an earlier version of these comments to a number
of friends, colleagues, and acquaintances (a few people actually fell
into more than one of these three categories). A gratifying number
of these people reacted. Now on the basis of those reactions I
am convinced of two things. First, I had come up with the best
projective technique for educational evaluators yet devised. I
won't take the time to share with you those "projective" reactions
right now, but let me say that if I were to give you the list of
names of those people who reacted and another list of the reactions,
you would not have a very difficult time matching them. Additionally,
many of the reactions I received challenged my statement about the
centrality of explanation to evaluation, for one primary reason. In
that earlier draft I made the statement: "Explanation, as it is used
in this paper, refers to the determination of the most probable
cause for a phenomenon." To use the word "cause," especially with
people trained in the social sciences, is either naive or foolhardy;
it's sort of like sticking one's head into a beehive.

¹ I feel the need to qualify that slightly and say that although
it is a central focus, it isn't the only focus and should not be
the primary focus of many evaluation studies.

Nevertheless, I am going to stand by what I wrote then.
(Naiveté dies hard in me!) One difference, though, is that I am
going to try to clarify what I mean by "cause" in the context of this
paper. Ernest Nagel, in his chapter on "Types of Causal Explanation
in Science" in Lerner's book Cause and Effect, has considered, among
other things, what he referred to as "conditionally necessary causes."
I'm not sure that I can do justice to that notion here, but I'll
make a stab at it. That is, suppose event E was observed. When
E occurred, antecedent conditions A, B, and C were present. (It
is possible, as Nagel pointed out, that we may be unaware of the
existence of some or all of these conditions.) The general rule
which applies to this situation might be stated as follows:
Given that conditions A, B, and C are present, then if condition D
is also present, event E will occur, while if D is not present, E
will not occur. Since condition D in and of itself is not sufficient
to bring about the occurrence of E, and since E may occur under some
other circumstances in the absence of D, we may speak of D as a
contingently necessary cause of event E. This is precisely the
notion of causality I had in mind in writing that "explanation --
the determination of the most probable cause or causes for a
phenomenon -- is a central focus of educational evaluation."
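
The rule just stated can be checked mechanically against a toy model. The sketch below is only an illustration: the function event_e and its other_route argument are hypothetical, introduced here to stand for "some other circumstances" under which E may occur without D. It tests the three clauses of the definition in turn.

```python
from itertools import product


def event_e(a, b, c, d, other_route=False):
    """Hypothetical rule: E occurs if A, B, C and D are all present,
    or if some unrelated route (other_route) produces it."""
    return (a and b and c and d) or other_route


def is_contingently_necessary(rule):
    """Check the three clauses of the definition against a rule."""
    # 1. Given A, B, C (and no other route), E occurs exactly when D does.
    decisive_in_context = (rule(True, True, True, True)
                           and not rule(True, True, True, False))
    settings = list(product([False, True], repeat=5))
    # 2. D alone is not sufficient: some case has D present but E absent.
    not_sufficient = any(d and not rule(a, b, c, d, o)
                         for a, b, c, d, o in settings)
    # 3. D is not necessary in general: some case has E present without D.
    not_always_necessary = any(rule(a, b, c, d, o) and not d
                               for a, b, c, d, o in settings)
    return decisive_in_context and not_sufficient and not_always_necessary


print(is_contingently_necessary(event_e))
```

A rule in which D by itself always produces E fails the second clause, so D would then be a sufficient cause rather than a contingently necessary one.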

It is my contention that, in every type of evaluation presented
by the PDK Committee, explanation is crucial. Further, I would
argue that the PDK volume does not adequately treat this concept nor
does it adequately consider some of the implications of the concept
for the methodology of evaluation. They have treated it somewhat
at various points in the monograph, but nowhere is it presented in
detail.

The Role of Explanation in Evaluation 

In evaluation, as in experimentation, we seek to rule out,
insofar as we are able, alternative explanations for phenomena. One
aspect of context evaluation involves monitoring the system in order
to identify problems and isolate possible causes of these problems.
Since the subsequent delineation of a class of possible change
strategies is directly determined by the causes so identified,
it is vital that the evaluator be able to provide information of
such quality as to insure that the identification of a cause or
causes has a high probability of being correct. In other words,
alternative explanations for the observed phenomenon or problem
must be shown to be unlikely.

In input evaluation, also, the issue is one of explanation, the
attribution of causality. For example, if we do something, A, then
X will be more likely to occur than if we do B or C; i.e., A is a
more probable cause (as cause was defined earlier) than are
B and C. Once again, the decision (to do A, or B, or C) determines
how and where and to what extent we are going to invest our resources.
The ruling out of (or the assignment of low probabilities to)
alternate explanations is critical.



One major focus of process evaluation is upon the early
identification and removal of barriers to the success of the particular
program selected to implement the change strategy. As before, we
are faced with the need for valid explanations. To call something
a "barrier to success" is to make a causal inference of the form:
if Q, then probably not X. That is, the occurrence of (existence of)
Q reduces the likelihood that X will occur (increases the likelihood
that "not X" will occur). Solving the problems of barriers is in
this way formally equivalent to making the kinds of selection decisions
which input evaluation serves, with the same implications relative
to the attribution of causality.
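
The form "if Q, then probably not X" amounts to comparing how often X occurs with and without Q. A minimal sketch of that comparison follows; the record layout and the numbers are invented purely for illustration. A positive gap is only consistent with Q being a barrier; it does not by itself rule out the alternative explanations that are the concern of this paper.

```python
def outcome_rate(records, q_present):
    """Share of cases in which X occurred, among cases where Q matches q_present.

    `records` is a hypothetical list of (q, x) pairs of booleans.
    """
    outcomes = [x for q, x in records if q == q_present]
    if not outcomes:
        raise ValueError("no cases observed for this condition")
    return sum(outcomes) / len(outcomes)


def barrier_gap(records):
    """Rate of X without Q minus rate of X with Q.

    A positive gap is consistent with Q acting as a barrier to X.
    """
    return outcome_rate(records, False) - outcome_rate(records, True)


# Invented records, for illustration only: (Q present?, X occurred?)
records = [(True, False), (True, False), (True, True), (True, False),
           (False, True), (False, True), (False, False), (False, True)]
print(barrier_gap(records))
```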

Finally, product evaluation can be thought of as representing
the effort toward final verification of the web of explanations which
has preceded it. If the causal relationships postulated earlier
have been correct (if the explanations have been valid), then the
hoped-for (intended) outcomes will occur.² It is in connection with
product evaluation that we most often bring to bear the wealth of
inferential statistical methods, apply our principles of experimental
design, and in general call up our methodological "big guns." The
concern in this paper is that we cannot afford to wait until this
final stage to provide a sound methodological base for causal
inference. The methodology of experimental design and traditional
statistical techniques may not be (and probably are not) appropriate
throughout the evaluation process, but some methodologies must be
employed which will provide us with a sound basis for our explanations.

The Search for Methodology

The preceding paragraphs have made a case for the centrality of
"explanation" to evaluation as it is represented in the CIPP approach.
On the basis of those arguments, one must conclude that the ruling out
of (or assigning low probabilities to) alternative explanations --
or at least providing data upon which to base such decisions
about alternative explanations -- is an important aspect of
evaluation.

While the distinction between research and evaluation is important
and needs to be emphasized (as the PDK authors have done), I have
some fear that a preoccupation with the differentiation may lead to
an overly casual attitude on the part of some evaluators toward the
quality of the information on which explanations produced within the
evaluation setting are based. Threats to internal and, in some
instances, external validity must receive extensive attention.
If anything, they are even more important in an evaluation setting,
where decisions (based on chains of causal inferences) determine
the allocation of precious resources to a considerable degree,
than they are in most research (especially basic research) settings.
If a researcher commits a Type I error, he (or other researchers) may
pursue an inappropriate question until the error is discovered and
corrected. On the other hand, the possible consequences of an
evaluator (or decision maker on the basis of information provided
by the evaluator) committing the analogous kind of error are much
more immediately felt in the resulting misallocation of resources.

² It is appropriate at this point to remind ourselves that
other, unintended outcomes will also occur.

The traditional model for educational research derives to a
great extent from agricultural experimentation, after being filtered
through experimental psychology. In his efforts to provide valid
information on which to base explanations, the evaluator will often
find this existing methodology both inadequate and inappropriate.
In such circumstances, there are at least two alternatives to be
considered. As a first step, we can seek methodologies for arriving
at valid explanations which have been successfully utilized in other
disciplines such as sociology, economics, anthropology, history, and
so forth. Tom Hastings, a couple of years ago in his presidential
address to NCME,³ addressed himself to this issue. A second alternative,
once inadequacies in methodology have been identified, is to set out
to develop new approaches for gathering and analyzing information, in
order to minimize the probability that alternative explanations are
in fact correct.

Identifying Methodological Needs 

In the preceding section, I pointed out very generally a task for
evaluation methodologists. One essential aspect of that task is the
identification of evaluation activities for which existing methodology
is inadequate. Through an emphasis on the underlying search for
causality, we should be able readily to identify many of those
inadequacies. This approach leads directly to a concern for the
nature of evidence. What kinds of evidence will best enable the
evaluator (or decision maker) to confidently discard alternative
explanations as implausible? How can the evidence the evaluator
collects best be communicated to the decision maker? There is
also a question here about the adequacy of the explanation. One
way of viewing that is to say that the explanation is adequate if
the intent of that explanation is satisfied. I think we
need to distinguish explanation in evaluation from explanation
in research at least partially on this basis.

³ Hastings, J. T. "The Kith and Kin of Educational Measurers,"
Journal of Educational Measurement, 1969, 6, 127-130.

As an illustration, in a paper session on curriculum evaluation,
one of the presenters discussed an evaluation of an approach to
preventing dropouts in high school. In the course of the evaluation,
he collected data on attendance rates for those students in the
experimental program and students in some of the control groups.
Having first noted that attendance was much better in the experimental
setting, he went on to offer a couple of possible explanations.
First, it was a work-study program, and if a student did not go to
school on a particular day he could not work that day (and therefore
wouldn't be paid for that day's work). Secondly, any time a student
was absent, somebody immediately called his parents, either at home
or at work, to find out what was wrong. After the presentation
somebody in the audience got up and said, "Wouldn't it be nice if
we could get better information out of this kind of situation by
designing a little experiment in which we have perhaps a two-by-two
factorial design involving these two tactics? If you think of 400
students randomly assigned according to conditions, one group whose
absence was handled with the phone call and not being permitted
to work, another group only by the phone call, a third group only
by not permitting them to work, and the fourth group nothing, maybe
we could learn something about which variable (not being permitted
to work, or the phone call to parents) is the important one."
The reaction was: "That might be nice, but I was an external evaluator
for this project and it wasn't my place to redesign the project in
order to serve this kind of research need. For the purpose of the
evaluation it was sufficient to show that what was being done in
this project did in fact have the effect of reducing absence."
Without going further, one may ask which of these two factors is more
likely to be the cause. At this point both of them were operating
within the program; the evaluator felt that he did not have the right
(or the need) to intrude into the operation of the program to
the extent of suggesting this kind of more traditional research
design. For the purposes of the evaluation, the explanation was
adequate. For purposes of acquiring generalizable, scientific
knowledge, it was not.
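
The two-by-two design proposed from the audience can be sketched in a few lines. Everything here is an assumption made for illustration: the cell labels, the equal split of 100 students per cell, and above all the cell means, which are invented and are not data from the project described. The point is only to show how such a design separates the contribution of the phone call from that of the work ban.

```python
import random

# Each cell is (phone_call, work_ban); True means the tactic is applied.
CELLS = [(call, ban) for call in (True, False) for ban in (True, False)]


def assign(student_ids, seed=0):
    """Randomly assign students in equal numbers to the four cells."""
    ids = list(student_ids)
    random.Random(seed).shuffle(ids)
    size = len(ids) // len(CELLS)
    return {cell: ids[i * size:(i + 1) * size] for i, cell in enumerate(CELLS)}


def effects(m):
    """Main effects and interaction from the four cell means (days absent)."""
    call = (m[(True, True)] + m[(True, False)]
            - m[(False, True)] - m[(False, False)]) / 2
    ban = (m[(True, True)] + m[(False, True)]
           - m[(True, False)] - m[(False, False)]) / 2
    interaction = (m[(True, True)] - m[(True, False)]
                   - m[(False, True)] + m[(False, False)]) / 2
    return {"phone_call": call, "work_ban": ban, "interaction": interaction}


groups = assign(range(400))
print({cell: len(members) for cell, members in groups.items()})

# Invented cell means, for illustration only.
example_means = {(True, True): 2.0, (True, False): 3.0,
                 (False, True): 5.0, (False, False): 6.0}
print(effects(example_means))
```

With these invented means the phone call accounts for a larger reduction in absence than the work ban, and the two do not interact. Whether anything of the kind holds in a real program is exactly what the evaluator in the story had no means of telling.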

Given the position of the PDK Committee that evaluation serves
the decision maker, other very important questions arise. For example,
what kinds of evidence is the decision maker willing to accept
as bases for his inferences? Another question is: Are these the kinds
of evidence he should (according to some criteria) accept? The
hope is that there is some commonality among decision makers in
terms of the kinds of evidence they are willing to accept; that
the answer to this question does not depend entirely upon the
idiosyncrasies of the individual decision makers; that given certain
decision settings and decision types, decision makers in common
tend to seek certain kinds of evidence. Answering the "should"
question will take much hard, logical thinking and, probably,
years of investigation in an effort to validate the outcomes of
that thinking.


Summary 

Let me summarize now and say that, if properly carried out,
the task of the evaluator is in some ways much more difficult
than that of the researcher. First, the evaluator typically finds
himself working in naturalistic settings, settings in which many
uncontrolled (and often uncontrollable) sources of variation
are operating. He is placed in the position of seeking consistent
covariation over time and over context, such covariation to be
an important datum for his attempts at inferential explanation.
Because the consequences of decisions based on evaluation data
have considerable implication for (and effect on) the allocation
of resources, it is imperative that gaps in existing evaluation
methodology be identified and some of those resources allocated
to closing the gaps.

You probably have made an inference about my comments by
now, one I would like to reinforce. (You have probably made several
other inferences I would rather not reinforce, also.) Namely, I
do not have any panaceas; I am not even sure where the answers will
come from. But I would like to see more people spending time
worrying the issue of evidence, explanation, and causality in
educational evaluation.
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